Replication of Checkpoints in Recoverable DSM Systems

نویسندگان

  • Jerzy Brzezinski
  • Michal Szychowiak
چکیده

This paper presents a new technique of recovery for object-based Distributed Shared Memory (DSM) systems. The new technique, integrated with a coherence protocol for atomic consistency model, offers high availability of shared objects in spite of multiple node and communication failures, introducing little overhead. It ensures fast recovery in case of multiple node failures and enables a DSM system to circumvent the network partitioning, as far as a majority partition can be constituted.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Replication for Efficiency and Fault Tolerance in a Dsm System

Distributed Shared Memory (DSM) systems implemented on a network of workstations (NOW) have become a convenient alternative to shared memory archi-tectures to execute long running parallel applications. However, such architectures are susceptible to experience failures. This paper presents the design and implementation of a recoverable DSM (RDSM) based on a backward error recovery (BER) mechani...

متن کامل

A Recoverable Distributed Shared Memory Integrating Coherence and Recoverability

Large-scale distributed systems are very attractive for the execution of parallel applications requiring a huge computing power. However, their high probability of site failure is unacceptable, especially for long time running applications. In this paper, we address this problem and propose a checkpointing mechanism relying on a recoverable distributed shared memory (DSM). Although most recover...

متن کامل

Icare: Combining Efficiency and High-availability in a Dsm System

In light of the increasing throughput of local area networks, Networks Of Workstations (NOW) which provide a distributed shared memory (DSM) have become a convenient alternative to parallel architectures in the framework of parallel scientific applications. ICARE is a recoverable DSM based on backward error recovery which is implemented on top of an experiments ATM platform running the CHORUS m...

متن کامل

An Extended Coherence Protocol for Recoverable DSM Systems with Causal Consistency

This paper presents a coherence protocol for recoverable Distributed Shared Memory (DSM) systems with causally consistent read-write objects. It uses independent checkpointing tightly integrated with coherence operations. That integration results in high availability of shared objects and ensures fast restoration of the consistent state of DSM in spite of multiple node failures, introducing lit...

متن کامل

Recoverable Distributed Shared Memory Using the Competitive Update Protocol

In this paper, we propose a recoverable DSM that uses a competitive update protocol. In this update protocol, multiple copies of each page may be maintainedat different nodes. However, it is also possible fora page to exist in only one node, as some copies of the page may be invalidated. We propose an implementation that makes the competitive update protocol recoverable from a single node failu...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2003